Scalable real time metrics management

ABSTRACT

Managing performance metrics includes: obtaining a plurality of performance metrics associated with a plurality of sources on a network; aggregating, at a first rate, the plurality of performance metrics associated with the plurality of sources to generate a plurality of first aggregated results; maintaining at least some of the plurality of first aggregated results in one or more memories; aggregating, at a second rate, the plurality of first aggregated results to generate a plurality of second aggregated results, the second rate being a lower rate than the first rate; and maintaining at least some of the plurality of second aggregated results in the one or more memories.

CROSS REFERENCE TO OTHER APPLICATIONS

This application claims priority to U.S. Provisional Patent Application No. 62/137,625 entitled REAL TIME METRICS ENGINE filed Mar. 24, 2015, which is incorporated herein by reference in its entirety for all purposes.

BACKGROUND OF THE INVENTION

Metrics (also referred to as performance metrics) are used by computer systems to quantify the measurement of system performance. Metrics are critical for analyzing systems' operations and providing feedback for improvements.

In modern computer systems, the quantity of metrics can be large. For example, suppose that a single cloud application collects 1000 metrics for analysis every 5 seconds, which means that 720,000 metrics are collected every hour. In a typical high scale environment such as an enterprise data center that supports thousands of applications each executing on multiple servers, the rate can be on the order of billions of metrics per hour.

Currently, most performance monitoring tools save collected metrics to a database, then perform analysis offline. These tools tend to scale poorly because of the high number of input/output (I/O) operations (such as database reads and writes) required for storing and processing a large number of metrics. Further, these tools typically do not support real time analytics due to the latency and processing overhead in storing and processing metrics data in the database.

BRIEF DESCRIPTION OF THE DRAWINGS

Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.

FIG. 1 is a functional diagram illustrating a programmed computer system for managing metrics in accordance with some embodiments.

FIG. 2 is a block diagram illustrating an embodiment of a data center that includes a scalable distributed metrics manager.

FIG. 3A is a block diagram illustrating an embodiment of a metrics pipeline in a scalable distributed metrics manager.

FIG. 3B is a diagram illustrating an embodiment of a metric data structure.

FIG. 3C is a diagram illustrating an embodiment of a metrics message.

FIG. 4 is a flowchart illustrating an embodiment of a process for managing metrics.

FIGS. 5A-5B are diagrams illustrating an embodiment of an approach for archiving the aggregated results.

FIGS. 6A-6B are diagrams illustrating another embodiment of an approach for archiving the aggregated results.

FIG. 7 is a flowchart illustrating an embodiment of a process for querying metrics data stored in a database.

FIG. 8 is a diagram illustrating an example of a query to a database comprising multiple time series based database tables.

DETAILED DESCRIPTION

The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. As used herein, the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.

A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.

Managing metrics for high scale environments is disclosed. In some embodiments, the metrics are managed and processed in a pipeline comprising multiple stages. A plurality of performance metrics associated with a plurality of sources on a network is obtained. The plurality of performance metrics is aggregated at a first rate to generate a plurality of first aggregated results, and at least some of the plurality of first aggregated results are maintained for a time in one or more memories. The plurality of first aggregated results is aggregated at a second rate to generate a plurality of second aggregated results, the second rate being a lower rate than the first rate. At least some of the plurality of second aggregated results are maintained in the one or more memories. Additional aggregation stages can be used. The aggregated results can be persisted to a persistent storage.

FIG. 1 is a functional diagram illustrating a programmed computer system for managing metrics in accordance with some embodiments. As will be apparent, other computer system architectures and configurations can be used to manage and process metrics. Computer system 100, which includes various subsystems as described below, includes at least one microprocessor subsystem (also referred to as a processor or a central processing unit (CPU)) 102. For example, processor 102 can be implemented by a single-chip processor or by multiple processors. In some embodiments, processor 102 is a general purpose digital processor that controls the operation of the computer system 100. Using instructions retrieved from memory 110, the processor 102 controls the reception and manipulation of input data, and the output and display of data on output devices (e.g., display 118). In some embodiments, processor 102 includes and/or is used to provide server functions described below with respect to server 202, etc. of FIG. 2.

Processor 102 is coupled bi-directionally with memory 110, which can include a first primary storage, typically a random access memory (RAM), and a second primary storage area, typically a read-only memory (ROM). As is well known in the art, primary storage can be used as a general storage area and as scratch-pad memory, and can also be used to store input data and processed data. Primary storage can also store programming instructions and data, in the form of data objects and text objects, in addition to other data and instructions for processes operating on processor 102. Also as is well known in the art, primary storage typically includes basic operating instructions, program code, data, and objects used by the processor 102 to perform its functions (e.g., programmed instructions). For example, memory 110 can include any suitable computer-readable storage media, described below, depending on whether, for example, data access needs to be bi-directional or uni-directional. For example, processor 102 can also directly and very rapidly retrieve and store frequently needed data in a cache memory (not shown).

A removable mass storage device 112 provides additional data storage capacity for the computer system 100, and is coupled either bi-directionally (read/write) or uni-directionally (read only) to processor 102. For example, storage 112 can also include computer-readable media such as magnetic tape, flash memory, PC-CARDS, portable mass storage devices, holographic storage devices, and other storage devices. A fixed mass storage 120 can also, for example, provide additional data storage capacity. The most common example of mass storage 120 is a hard disk drive. Mass storages 112, 120 generally store additional programming instructions, data, and the like that typically are not in active use by the processor 102. It will be appreciated that the information retained within mass storages 112 and 120 can be incorporated, if needed, in standard fashion as part of memory 110 (e.g., RAM) as virtual memory.

In addition to providing processor 102 access to storage subsystems, bus 114 can also be used to provide access to other subsystems and devices. As shown, these can include a display monitor 118, a network interface 116, a keyboard 104, and a pointing device 106, as well as an auxiliary input/output device interface, a sound card, speakers, and other subsystems as needed. For example, the pointing device 106 can be a mouse, stylus, track ball, or tablet, and is useful for interacting with a graphical user interface.

The network interface 116 allows processor 102 to be coupled to another computer, computer network, or telecommunications network using a network connection as shown. For example, through the network interface 116, the processor 102 can receive information (e.g., data objects or program instructions) from another network or output information to another network in the course of performing method/process steps. Information, often represented as a sequence of instructions to be executed on a processor, can be received from and outputted to another network. An interface card or similar device and appropriate software implemented by (e.g., executed/performed on) processor 102 can be used to connect the computer system 100 to an external network and transfer data according to standard protocols. For example, various process embodiments disclosed herein can be executed on processor 102, or can be performed across a network such as the Internet, intranet networks, or local area networks, in conjunction with a remote processor that shares a portion of the processing. Additional mass storage devices (not shown) can also be connected to processor 102 through network interface 116.

An auxiliary I/O device interface (not shown) can be used in conjunction with computer system 100. The auxiliary I/O device interface can include general and customized interfaces that allow the processor 102 to send and, more typically, receive data from other devices such as microphones, touch-sensitive displays, transducer card readers, tape readers, voice or handwriting recognizers, biometrics readers, cameras, portable mass storage devices, and other computers.

In addition, various embodiments disclosed herein further relate to computer storage products with a computer readable medium that includes program code for performing various computer-implemented operations. The computer-readable medium is any data storage device that can store data which can thereafter be read by a computer system. Examples of computer-readable media include, but are not limited to, all the media mentioned above: magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks; magneto-optical media such as optical disks; and specially configured hardware devices such as application-specific integrated circuits (ASICs), programmable logic devices (PLDs), and ROM and RAM devices. Examples of program code include both machine code, as produced, for example, by a compiler, and files containing higher level code (e.g., script) that can be executed using an interpreter.

The computer system shown in FIG. 1 is but an example of a computer system suitable for use with the various embodiments disclosed herein. Other computer systems suitable for such use can include additional or fewer subsystems. In addition, bus 114 is illustrative of any interconnection scheme serving to link the subsystems. Other computer architectures having different configurations of subsystems can also be utilized.

FIG. 2 is a block diagram illustrating an embodiment of a data center that includes a scalable distributed metrics manager. In this example, client devices such as 252 connect to a data center 250 via a network 254. A client device can be a laptop computer, a desktop computer, a tablet, a mobile device, a smart phone, a wearable networking device, or any other appropriate computing device. In some embodiments, a web browser and/or a standalone client application is installed at each client, enabling a user to use the client device to access certain applications hosted by data center 250. Network 254 can be the Internet, a private network, a hybrid network, or any other communications network.

In the example shown, a networking layer 255 comprising networking devices such as routers, switches, etc. forwards requests from client devices 252 to a distributed network service platform 204. In this example, distributed network service platform 204 includes a number of servers configured to provide a distributed network service. A physical server (e.g., 202, 204, 206, etc.) has hardware components and software components, and may be implemented using a device such as 100. In this example, hardware (e.g., 208) of the server supports operating system software in which a number of virtual machines (VMs) (e.g., 218, 219, 220, etc.) are configured to execute. A VM is a software implementation of a machine (e.g., a computer) that simulates the way a physical machine executes programs. The part of the server's operating system that manages the VMs is referred to as the hypervisor. The hypervisor interfaces between the physical hardware and the VMs, providing a layer of abstraction to the VMs. Through its management of the VMs' sharing of the physical hardware resources, the hypervisor makes it appear as though each VM were running on its own dedicated hardware. Examples of hypervisors include the VMware Workstation® and Oracle VM VirtualBox®. Although physical servers supporting VM architecture are shown and discussed extensively for purposes of example, physical servers supporting other architectures such as container-based architecture (e.g., Kubernetes®, Docker®, Mesos®), standard operating systems, etc., can also be used and techniques described herein are also applicable. In a container-based architecture, for example, the applications are executed in special containers rather than virtual machines.

In some embodiments, instances of applications are configured to execute within the VMs. Examples of such applications include web applications such as shopping cart, user authentication, credit card authentication, email, file sharing, virtual desktops, voice/video streaming, online collaboration, and many others.

One or more service engines (e.g., 214, 224, etc.) are instantiated on a physical device. In some embodiments, a service engine is implemented as software executing in a virtual machine. The service engine is executed to provide distributed network services for applications executing on the same physical server as the service engine, and/or for applications executing on different physical servers. In some embodiments, the service engine is configured to enable appropriate service components that implement service logic. For example, a load balancer component is executed to provide load balancing logic to distribute traffic load amongst instances of applications executing on the local physical device as well as other physical devices; a firewall component is executed to provide firewall logic to instances of the applications on various devices; a metrics agent component is executed to gather metrics associated with traffic, performance, etc. associated with the instances of the applications, etc. Many other service components may be implemented and enabled as appropriate. When a specific service is desired, a corresponding service component is configured and invoked by the service engine to execute in a VM.

In the example shown, traffic received on a physical port of a server (e.g., a communications interface such as Ethernet port 215) is sent to a virtual switch (e.g., 212). In some embodiments, the virtual switch is configured to use an API provided by the hypervisor to intercept incoming traffic designated for the application(s) in an inline mode, and send the traffic to an appropriate service engine. In inline mode, packets are forwarded on without being replicated. As shown, the virtual switch passes the traffic to a service engine in the distributed network service layer (e.g., the service engine on the same physical device), which transforms the packets if needed and redirects the packets to the appropriate application. The service engine, based on factors such as configured rules and operating conditions, redirects the traffic to an appropriate application executing in a VM on a server. Details of the virtual switch and its operations are outside the scope of the present application.

Controller 290 is configured to control, monitor, program, and/or provision the distributed network services and virtual machines. In particular, the controller includes a metrics manager 292 configured to collect performance metrics and perform analytical operations. The controller can be implemented as software, hardware, firmware, or any combination thereof. In some embodiments, the controller is implemented on a system such as 100. In some cases, the controller is implemented as a single entity logically, but multiple instances of the controller are installed and executed on multiple physical devices to provide high availability and increased capacity. In embodiments implementing multiple controllers, known techniques such as those used in distributed databases are applied to synchronize and maintain coherency of data among the controller instances.

Within data center 250, one or more controllers 290 gather metrics data from various nodes operating in the data center. As used herein, a node refers to a computing element that is a source of metrics information. Examples of nodes include virtual machines, networking devices, service engines, or any other appropriate elements within the data center.

Many different types of metrics can be collected by the controller. For example, since traffic (e.g., connection requests and responses, etc.) to and from an application will pass through a corresponding service engine, metrics relating to the performance of the application and/or the VM executing the application can be directly collected by the corresponding service engine. As another example, to collect metrics relating to client responses, a service engine sends a script to a client browser or client application. The script measures client responses and returns one or more collected metrics back to the service engine. In both cases, the service engine sends the collected metrics to controller 290. Additionally, infrastructure metrics relating to the performance of other components of the service platform (e.g., metrics relating to the networking devices, metrics relating to the performance of the service engines themselves, metrics relating to the host devices such as data storage as well as operating system performance, etc.) can be collected by the controller. Specific examples of the metrics include round trip time, latency, bandwidth, number of connections, etc.

The components and arrangement of distributed network service platform 204 described above are for purposes of illustration only. The technique described herein is applicable to network service platforms having different components and/or arrangements.

FIG. 3A is a block diagram illustrating an embodiment of a metrics pipeline in a scalable distributed metrics manager. Pipeline 300 implements the process for aggregating metrics and can be used to implement scalable metrics manager 292. A pipeline processes one or more specific types of metrics, and multiple pipelines similar to 300 can be configured to process different types of metrics. In this example, pipeline 300 receives metrics from a variety of sources, such as service engines 214, 224, etc., via data streams. Metrics can also be received from other sources such as network devices, an operating system, a virtual switch, etc. (not shown). The performance metrics are continuously collected at various sources (e.g., service engines, network devices, etc.) and sent to the first stage (e.g., stage 302 of FIG. 3A) of the pipeline. The rate at which a metric is generated is arbitrary and can vary for different sources. For example, one service engine can generate metrics at a rate of 1 metric/second, while another service engine can generate metrics at a rate of 2 metrics/second.

Pipeline 300 comprises multiple stages aggregating metrics at different rates. In particular, the first stage aggregates raw metrics, and each successive stage aggregates the outputs from the previous stage at a lower rate (or equivalently, a coarser granularity of time or lower frequency). In the example shown, three stages are used: stage 302 aggregates metrics from their sources every 5 seconds, stage 304 aggregates the results of stage 302 every 5 minutes, and stage 306 aggregates the results of stage 304 every hour. Different numbers of stages and/or aggregation rates can be used in other embodiments. The metrics pipeline is implemented in memory to allow for fast access and analytical operations. Each stage only needs to maintain a sufficient number of inputs (e.g., metrics or results from the previous stage) in memory to perform aggregation; thus the overall number of metrics to be stored and the total amount of memory required for real time analysis remain reasonable, and the pipeline can be implemented for high scale environments such as enterprise data centers where large volumes of metrics are constantly generated. Note that although separate buffers are shown for the output of one stage and the input of the next stage, in some implementations only one set of buffers needs to be maintained. As will be described in greater detail below, each stage performs one or more aggregation functions to generate aggregated results. Further, each of the pipeline stages is optionally connected to a persistent storage 310, such as a database, a file system, or any other appropriate non-volatile storage system, in order to write the aggregated results to the storage and back up the metrics data more permanently. For example, MongoDB is used in some implementations. Further details of the pipeline's operations are explained in connection with FIG. 4 below.
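The staged roll-up described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of pipeline 300: the `Stage` class and its `push` method are hypothetical, a count-based trigger stands in for a timer, and exactly one raw sample per second is assumed so that 5 inputs, 60 inputs, and 12 inputs correspond to the 5-second, 5-minute, and 1-hour stages.

```python
from statistics import mean
from typing import Callable, List, Optional


class Stage:
    """One in-memory aggregation stage (hypothetical helper for illustration)."""

    def __init__(self, inputs_per_result: int,
                 aggregate: Callable[[List[float]], float],
                 next_stage: Optional["Stage"] = None) -> None:
        self.inputs_per_result = inputs_per_result
        self.aggregate = aggregate
        self.next_stage = next_stage
        self.buffer: List[float] = []   # inputs still waiting to be aggregated
        self.emitted = 0                # number of aggregated results produced
        self.last_result: Optional[float] = None

    def push(self, value: float) -> None:
        self.buffer.append(value)
        if len(self.buffer) < self.inputs_per_result:
            return
        result = self.aggregate(self.buffer)
        self.buffer.clear()             # consumed inputs are released from memory
        self.emitted += 1
        self.last_result = result
        if self.next_stage is not None:
            self.next_stage.push(result)


# Assumption: one raw sample per second, so 5 samples span 5 seconds.
hourly = Stage(12, mean)                    # 12 five-minute results per hour
five_minute = Stage(60, mean, hourly)       # 60 five-second results per 5 minutes
five_second = Stage(5, mean, five_minute)   # 5 raw samples per 5 seconds

for second in range(3600):                  # one simulated hour
    five_second.push(float(second % 10))
```

In this sketch each stage holds at most one batch of pending inputs at a time, which is the property that keeps the memory footprint bounded.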

FIG. 3B is a diagram illustrating an embodiment of a metric data structure. In this example, metric 350 is a key-tuple data structure that includes the following fields: {MetricsObjectType, Entity, Node, ObjectID}. Depending on implementation, the values in the fields can be alphanumeric strings, numerical values, or any other appropriate data formats. MetricsObjectType specifies the type of metric being sent. Examples of MetricsObjectType include client metric, front end network metric, backend network metric, application metric, etc. Entity specifies the particular element about which the metric is being reported, such as a particular server or application executing on a virtual machine. Node specifies the particular element that is reporting the metric, such as a particular service engine. An Entity may include multiple objects, and ObjectID specifies the particular object within the entity that is generating the metric, such as a particular Internet Protocol (IP) address, a particular port, a particular Uniform Resource Identifier (URI), etc. In the example shown, MetricsObjectType is set to vserver l4 client (which corresponds to a type of metric related to virtual server layer 4 client), Entity is set to vs-1 (which corresponds to a server with the identifier of vs-1), Node is set to se-1 (which corresponds to a service engine with the identifier of se-1), and ObjectID is set to port (which corresponds to the port object within the server). When a metric is stored to the database, each field can be used as an index for lookups. Metrics with different fields can be defined and used. For example, in one implementation, a metric also includes a timestamp field.
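One way to model such a key-tuple is sketched below in Python. The class name `MetricKey`, the snake_case field names, the `samples` dictionary, and the type label "client metric" are assumptions made for illustration; only the four field roles and the identifiers vs-1, se-1, and port come from the description above.

```python
from typing import Dict, List, NamedTuple


class MetricKey(NamedTuple):
    """Key-tuple with the four fields named in the text."""
    metrics_object_type: str
    entity: str
    node: str
    object_id: str


# Hypothetical in-memory index: key-tuple -> reported values.
samples: Dict[MetricKey, List[float]] = {}

key = MetricKey(metrics_object_type="client metric",
                entity="vs-1", node="se-1", object_id="port")
samples.setdefault(key, []).append(42.0)

# Any single field can also serve as a lookup index.
by_entity = [k for k in samples if k.entity == "vs-1"]
```

Because a named tuple is hashable, the whole key can index a dictionary directly, while individual fields remain available for filtered lookups.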

In some embodiments, a source can report metrics to the metrics manager without requiring explicit registration with the metrics manager. A source can report metrics to the metrics manager by sending one or more messages having predetermined formats. FIG. 3C is a diagram illustrating an embodiment of a metrics message. The message includes a header that specifies certain characteristics of the metrics being sent (e.g., number of metrics in the message, timestamp of when the message is sent, etc.), and multiple metrics in the message body. In this example, the node batches multiple metrics in a single message and sends the message to the metrics manager.
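A batched message with a header and a body might look like the following Python sketch. The JSON encoding, the field names `count`, `sent_at`, `header`, and `body`, and the helper functions are all hypothetical; the description does not specify the wire format of FIG. 3C.

```python
import json
import time
from typing import Any, Dict, List, Optional


def build_message(metrics: List[Dict[str, Any]],
                  sent_at: Optional[float] = None) -> str:
    """Batch several metrics into one message (hypothetical JSON layout)."""
    header = {
        "count": len(metrics),
        "sent_at": time.time() if sent_at is None else sent_at,
    }
    return json.dumps({"header": header, "body": metrics})


def parse_message(raw: str) -> List[Dict[str, Any]]:
    """Return the metrics in a message after checking the header count."""
    message = json.loads(raw)
    body = message["body"]
    if message["header"]["count"] != len(body):
        raise ValueError("header count does not match message body")
    return body


batch = [
    {"entity": "vs-1", "node": "se-1", "object_id": "port", "value": 17},
    {"entity": "vs-1", "node": "se-1", "object_id": "port", "value": 21},
]
raw = build_message(batch, sent_at=1427155200.0)  # arbitrary fixed timestamp
```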

In some implementations, the metrics manager maintains multiple pipelines to process different types of metrics. Upon receiving a metrics message, the metrics manager parses the message to obtain metrics and places each metric in an appropriate pipeline for processing. In some embodiments, upon detecting that a metric includes a new instance of a key-tuple as discussed above, the metrics manager establishes a new in-memory processing unit (e.g., a specific pipeline such as 300 that is configured with its own memory, thread, and/or process for handling metrics associated with the key-tuple), and future metrics messages having the same key-tuple will be processed by this in-memory processing unit. In some embodiments, the metrics manager can establish one or more pipelines that receive as inputs multiple key-tuples in order to generate certain desired results. The configuration of specific pipelines depends on implementation.
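The create-on-first-sight behavior can be sketched as a dictionary of processing units keyed by key-tuple. The classes `ProcessingUnit` and `MetricsManager`, the `handle` method, and the sample key values are hypothetical stand-ins; a real unit could own its own thread or process, which this sketch omits.

```python
from typing import Dict, List, Tuple

KeyTuple = Tuple[str, str, str, str]  # (MetricsObjectType, Entity, Node, ObjectID)


class ProcessingUnit:
    """Stand-in for an in-memory pipeline dedicated to one key-tuple."""

    def __init__(self, key: KeyTuple) -> None:
        self.key = key
        self.pending: List[float] = []

    def accept(self, value: float) -> None:
        self.pending.append(value)


class MetricsManager:
    """Routes each metric to the unit for its key-tuple, creating it if new."""

    def __init__(self) -> None:
        self.units: Dict[KeyTuple, ProcessingUnit] = {}

    def handle(self, key: KeyTuple, value: float) -> ProcessingUnit:
        unit = self.units.get(key)
        if unit is None:                 # new key-tuple: no prior registration needed
            unit = ProcessingUnit(key)
            self.units[key] = unit
        unit.accept(value)
        return unit


manager = MetricsManager()
first = manager.handle(("client metric", "vs-1", "se-1", "port"), 3.0)
second = manager.handle(("client metric", "vs-1", "se-1", "port"), 4.0)
other = manager.handle(("client metric", "vs-2", "se-1", "port"), 9.0)
```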

FIG. 4 is a flowchart illustrating an embodiment of a process for managing metrics. Process 400 can be implemented by scalable metrics manager 292 operating on a system such as 100.

At 401, metrics associated with a plurality of sources are obtained. As discussed above, the metrics can be sent in messages to the metrics manager.

The obtained metrics are managed in a metrics pipeline as described above in connection with FIG. 3A.

Specifically, at 402, the metrics associated with a plurality of sources are aggregated at a first rate to generate a plurality of first aggregated results.

In some cases, aggregation includes applying one or more transform operations (also referred to as aggregation operations) that transform the received metrics to generate new metrics. For example, suppose that four instances of a particular application periodically generate a set of four metrics reporting the number of connections to each application instance. One or more aggregation functions (F1) can be performed to combine (e.g., add) the four metrics to generate a new aggregated result of the total number of connections, average the four metrics to generate a new aggregated result of average number of connections, determine the minimum and/or maximum number of connections among the four metrics, compute the difference between the maximum number of connections and the minimum number of connections, etc. Many other aggregation/transform functions are possible for various metrics manager implementations. In some cases, the raw metrics are sampled at the first rate to generate the aggregated results. More commonly, a transform operation generates a corresponding aggregated result (also referred to as derived metric) based on the inputs to the transform function. Multiple transform operations can generate a vector of aggregated results. Aggregations can be performed across service engines, across multiple servers in a pool, across multiple entities, across multiple objects, etc. A pipeline can be configured to transform any appropriate metrics into a new result. The specific aggregation functions in a pipeline can be configured by the programmer or administrator according to actual system needs.
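The four-instance example can be written out as a small Python function that returns a vector of aggregated results. The function name `f1`, the result keys, and the sample connection counts are hypothetical and chosen only to illustrate the sum, average, minimum, maximum, and difference operations named above.

```python
from statistics import mean
from typing import Dict, List


def f1(connections: List[int]) -> Dict[str, float]:
    """Apply several transform operations and return a vector of results."""
    lowest, highest = min(connections), max(connections)
    return {
        "total": sum(connections),
        "average": mean(connections),
        "min": lowest,
        "max": highest,
        "spread": highest - lowest,   # maximum minus minimum
    }


# Hypothetical connection counts reported by four application instances.
result = f1([12, 7, 30, 11])
```

With these sample inputs the total is 60, the average is 15, and the spread between the busiest and the least busy instance is 23.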

The first rate at which aggregation takes place corresponds to the rate at which the aggregation function is performed on the collected data. In the following examples, a constant rate is discussed extensively for purposes of example, but the rate can be a non-constant rate as well (e.g., aggregation happens when the number of metrics collected meets or exceeds a threshold or when some other triggering condition for aggregation is met). Because the aggregation only uses metrics stored in memory, it does not require any database calls and is highly efficient. Further, because the aggregation is done periodically and in a batched fashion (e.g., all first stages of the pipelines perform aggregation every 5 seconds), timers do not need to be maintained per object or per metric. Thus, aggregation can be performed quickly and efficiently.
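The difference between a constant-rate trigger and a threshold-based (non-constant) trigger can be expressed as a single predicate, sketched below. The function name `should_aggregate` and its parameters are assumptions; the 5-second default mirrors the first-stage period used in the example, and the threshold is optional.

```python
from typing import Optional


def should_aggregate(now: float, last_run: float, pending: int,
                     period: float = 5.0,
                     max_pending: Optional[int] = None) -> bool:
    """Decide whether a stage should aggregate on this check."""
    if now - last_run >= period:          # constant-rate trigger
        return True
    # Optional non-constant trigger: enough metrics have accumulated.
    return max_pending is not None and pending >= max_pending
```

A single periodic caller can evaluate this predicate for every first-stage pipeline in one pass, so no per-object or per-metric timer is needed in this sketch.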

The received metrics are temporarily maintained in a memory such as a Random Access Memory (RAM). The first aggregated results and/or the received metrics will be rolled up into the next stage periodically. The first stage only needs to maintain a sufficient number of input metrics in the memory until the first aggregation is performed, after which the metrics used in the aggregation can be removed from the memory in order to save space and make room for new aggregated results and/or metrics. Before being deleted from the memory, the first aggregated results and/or the obtained metrics are optionally output to a persistent storage such as a database. In some embodiments, the aggregation of the performance metrics and the maintenance of the first aggregated results are performed in a single process to reduce the overhead of context switches. It is permissible to implement the aggregation and the maintenance steps in separate processes.
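The aggregate, optionally persist, then release sequence can be sketched as follows. An in-memory SQLite database from the Python standard library is used purely as a stand-in for the persistent storage; the function name `aggregate_and_release`, the `results` table, and the sample values are hypothetical.

```python
import sqlite3
from typing import Callable, List, Optional


def aggregate_and_release(buffer: List[float],
                          aggregate: Callable[[List[float]], float],
                          store: Optional[sqlite3.Connection] = None) -> float:
    """Aggregate buffered metrics, optionally persist the result, free the buffer."""
    result = aggregate(buffer)
    if store is not None:                 # persistence is optional
        store.execute("INSERT INTO results (value) VALUES (?)", (result,))
        store.commit()
    buffer.clear()                        # inputs are no longer needed in memory
    return result


# Stand-in persistent store (assumption: any database or file system would do).
store = sqlite3.connect(":memory:")
store.execute("CREATE TABLE results (value REAL)")

pending = [3.0, 5.0, 4.0]
total = aggregate_and_release(pending, sum, store)
```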

As will be described in greater detail below, analytical operations, event detection and generation, and storage of the aggregated results and/or the metrics to a persistent storage can also be performed.

At 404, the first aggregated results are aggregated at a second rate to generate a plurality of second aggregated results. This is also referred to as a roll-up operation. In this case, the second rate (which can also be constant or non-constant) is on average lower than the first rate, and the aggregation function performed in the second stage is not necessarily the same as the aggregation function performed in the first stage. Referring again to the example shown in FIG. 3A, aggregated results of the first stage (stage 302) are sent to the second stage (stage 304), to be aggregated once every 5 minutes. If the first stage generates an aggregated result every 5 seconds, then every 5 minutes there will be 60 first aggregated results. These 60 first aggregated results from the first stage are aggregated again at the second stage according to the one or more aggregation functions (F2) specified in the second stage to generate one or more second aggregated results.
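The roll-up arithmetic and a second-stage function that differs from the first-stage function can be sketched as below. The record layout with `total` and `peak` fields, the function name `f2`, and the generated sample values are assumptions for illustration; only the 5-second and 5-minute periods come from the example.

```python
from statistics import mean
from typing import Dict, List

FIRST_PERIOD_S = 5          # stage 302 period, from the example
SECOND_PERIOD_S = 5 * 60    # stage 304 period, from the example
INPUTS_PER_ROLLUP = SECOND_PERIOD_S // FIRST_PERIOD_S   # 300 / 5 = 60


def f2(first_results: List[Dict[str, float]]) -> Dict[str, float]:
    """Roll up first-stage results; F2 need not match F1."""
    return {
        "avg_total": mean(r["total"] for r in first_results),
        "peak": max(r["peak"] for r in first_results),
    }


# Hypothetical first-stage outputs: one {"total", "peak"} record per 5 seconds.
first_results = [{"total": 100 + i, "peak": 40 + (i % 7)}
                 for i in range(INPUTS_PER_ROLLUP)]
second_result = f2(first_results)
```

Here the second stage averages the per-interval totals but takes the maximum of the per-interval peaks, which shows why each stage may need its own aggregation function.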

Similar to the first aggregated results, the second aggregated results are maintained in memory for a time. A sufficient number of the second aggregated results is maintained in the memory for the third stage of aggregation to be performed. The second aggregated results are optionally output to a persistent data store. After the second aggregated results are aggregated in the third stage, those second aggregated results that are used by the third stage for aggregation can be deleted from memory to save space. One or more analytical operations can be performed on the second aggregated results. Event detection and generation can also be performed on the second aggregated results. These operations are preferably implemented as inline operations of the metrics manager.

At 406, the plurality of second aggregated results is aggregated at a third rate to generate a plurality of third aggregated results. Referring again to the example shown in FIG. 3A, the aggregation results of the second stage (stage 304) are sent to the third stage (stage 306), to be aggregated at a rate of once every hour. Suppose that the second stage generates an aggregated result every 5 minutes; then in one hour there will be 12 aggregated results. These 12 aggregated results from the second stage are aggregated again at the third stage according to one or more aggregation functions (F3) associated with the third stage.
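The three roll-up steps above can be sketched as a chain of in-memory stages. This is a minimal illustration, not the disclosed implementation: the `Stage` class, the choice of `mean` and `max` as stand-ins for F1, F2, and F3, and the one-sample-per-second input rate are all assumptions, and aggregation is triggered by input count rather than by a timer to keep the sketch deterministic.

```python
from statistics import mean


class Stage:
    """One in-memory stage that rolls up `fan_in` inputs into one result."""

    def __init__(self, fan_in, aggregate, downstream=None):
        self.fan_in = fan_in          # number of inputs consumed per aggregation
        self.aggregate = aggregate    # aggregation function (stand-in for F1/F2/F3)
        self.downstream = downstream  # next stage, or None for the last stage
        self.buffer = []              # inputs waiting to be aggregated
        self.results = []             # aggregated results kept in memory

    def push(self, value):
        self.buffer.append(value)
        if len(self.buffer) == self.fan_in:
            result = self.aggregate(self.buffer)
            self.buffer.clear()       # inputs are no longer needed once aggregated
            self.results.append(result)
            if self.downstream is not None:
                self.downstream.push(result)


# Stage 3: 12 five-minute results -> 1 hourly result.
stage3 = Stage(fan_in=12, aggregate=max)
# Stage 2: 60 five-second results -> 1 five-minute result.
stage2 = Stage(fan_in=60, aggregate=mean, downstream=stage3)
# Stage 1: 5 raw samples (assumed one per second) -> 1 five-second result.
stage1 = Stage(fan_in=5, aggregate=mean, downstream=stage2)

# Feed one hour of synthetic samples through the pipeline.
for second in range(3600):
    stage1.push(float(second % 10))

print(len(stage1.results), len(stage2.results), len(stage3.results))
```

One hour of input yields 720 first-stage results, 12 second-stage results, and a single third-stage result, matching the 60-to-1 and 12-to-1 ratios in the example. A fuller version would also trim `results` once the next stage has consumed them.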

Although three stages are shown for purposes of illustration, other numbers of stages (e.g., two stages, or four or more stages) can be implemented in various embodiments.

In the above process, at each stage, the metrics manager can invoke one or more corresponding analytical operations (such as anomaly detection) on the aggregated results. For example, the Holt-Winters algorithm, which is used to detect outlier metrics and remove anomalies, can be performed at any of the stages. As another example, an aggregated result (e.g., the total number of connections) is compared with a threshold (e.g., a maximum number of 100) to detect if the threshold has been exceeded. Many analytical operations are possible and can be configured by the programmer or administrator according to actual system needs. Preferably, the analytical operation is implemented as an inline function within the metrics manager process. The inline function implements additional steps in the continuous processing of the metrics within the same process and software module. Because the aggregated results are kept in memory rather than streamed to a database and because the analytical operation is inline, the analytical operation does not require any input/output (I/O) operations such as database read/write, inter-process communication, system call, messaging, etc. Thus, the analytical operations are highly efficient compared with existing analytics tools that typically require database access or file system access. The analytical operations can be performed in real time (e.g., at substantially the same time as when the performance metrics are received, or at substantially the same time as when the aggregated results are generated).

In some embodiments, the metrics manager generates events when metrics and/or aggregated results meet certain conditions. Event detection and generation can be implemented as an inline function where certain conditions are tested on a per metric type, per entity, and per node basis. For example, if the network connections metrics of a server sent by a service engine indicate that the number of connections exceeds a threshold, then an event such as an alarm or log is triggered. Other examples of event triggering conditions include: a metric meeting or exceeding a high watermark level for the first time after the metric has stayed below a low threshold; a metric meeting or falling below a low watermark level after the metric has stayed above the threshold; a metric crossing a predefined threshold; a metric indicating that an anomaly has occurred, etc. Many conditions are possible, and in some embodiments, a set of rules is specified for these conditions, and a rules processing engine compares the values associated with metrics against the rules to detect whether any specified conditions are met.
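The high and low watermark conditions can be illustrated with a small two-state detector. This is a simplified sketch under stated assumptions: the `WatermarkDetector` class, the event names, and the threshold values are hypothetical, and the logic approximates the conditions above as simple hysteresis (a high event fires once when the metric reaches the high watermark, and cannot fire again until the metric has fallen to the low watermark).

```python
class WatermarkDetector:
    """Emits an event when a metric crosses a high or low watermark (hysteresis)."""

    def __init__(self, low, high):
        self.low = low
        self.high = high
        self.state = "normal"   # "normal" until the high watermark is reached

    def observe(self, value):
        """Return an event name, or None when no condition is met."""
        if self.state == "normal" and value >= self.high:
            self.state = "high"
            return "HIGH_WATERMARK"
        if self.state == "high" and value <= self.low:
            self.state = "normal"
            return "LOW_WATERMARK"
        return None


# Hypothetical connection counts for one server, checked inline as they arrive.
detector = WatermarkDetector(low=80, high=100)
samples = [50, 95, 120, 110, 90, 70, 130]
events = [detector.observe(v) for v in samples]
print(events)
```

Values between the two watermarks (such as 95, 110, and 90 here) generate no events, which avoids repeated alarms while a metric hovers near a single threshold.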

As discussed above, in some implementations metrics and aggregated results are recorded in a persistent storage such as a database for backup purposes. A retention policy is specified by the administrator to determine the amount of time for which corresponding stored data remains in the database. When the retention period is over, any data that is outside the retention policy period is erased from the database to conserve space.

FIGS. 5A-5B are diagrams illustrating an embodiment of an approach for archiving the aggregated results. In this example, aggregated results are written to the database at the time of aggregation, as shown in FIG. 5A. The retention policy periods for the first stage, the second stage, and the third stage are 2 hours, 2 days, and 1 year, respectively. Thus, after 2 hours, the database records corresponding to the first aggregated results that occurred before the current two hour window are deleted from the database, as shown in FIG. 5B. As can be seen, because the data from different stages is interspersed, deleting records associated with a particular stage can leave “holes” in the database, which slows down queries of the aggregated results, negatively impacts the database's write performance, and ultimately degrades the rate of aggregation.

To overcome the problem illustrated in FIGS. 5A-5B, time series based database tables are used in some embodiments. FIGS. 6A-6B are diagrams illustrating another embodiment of an approach for archiving the aggregated results. In this example, aggregated results from different stages of a pipeline occupy separate tables. Within a stage, multiple tables can be used to store the aggregated results. These tables are referred to as time series based database tables since they each correspond to a different period of aggregated results. Different time series based database tables can be subject to different retention policies. In this example, two tables are used to store the first aggregated results, where each table is configured to store one hour's worth of first aggregated results from the first stage; one table is used to store the second aggregated results, where the table is configured to store one day's worth of second aggregated results from the second stage; and one table is used to store the third aggregated results, where the table is configured to store one year's worth of third aggregated results from the third stage.

The aggregated results are written to the database in a batch in append mode. For example, the first aggregated results can be written to the database every 30 minutes rather than every five seconds. Maintaining a greater amount of aggregated results in memory permits less frequent database writes, which is more efficient. Thus, the rate at which the aggregated results are written can be configured based on tradeoffs of memory required to keep the aggregated results and efficiency of database writes. Further, the aggregated results are written to the database in tables according to the retention period of the corresponding retention policy.
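The tradeoff between memory and write frequency can be made concrete with a small buffer that appends to the database only when a batch is full. This is a sketch only: the `BatchWriter` class and its `append_batch` callback are hypothetical names, and a plain Python list stands in for the database append.

```python
class BatchWriter:
    """Buffers aggregated results in memory and appends them in batches."""

    def __init__(self, batch_size, append_batch):
        self.batch_size = batch_size      # results held in memory per write
        self.append_batch = append_batch  # hypothetical callable doing the append
        self.pending = []

    def add(self, result):
        self.pending.append(result)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.append_batch(list(self.pending))  # one write for the whole batch
            self.pending.clear()


# One result every 5 seconds, written every 30 minutes: 30 * 60 / 5 = 360 per batch.
BATCH_SIZE = 30 * 60 // 5

written_batches = []                      # stand-in for the database
writer = BatchWriter(BATCH_SIZE, written_batches.append)
for i in range(720):                      # one hour of five-second results
    writer.add(i)

print(BATCH_SIZE, len(written_batches))
```

With these numbers, one hour of first aggregated results costs two database writes instead of 720, at the price of holding up to 360 results in memory.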

Note that the table size does not need to exactly correspond to the amount of data generated during the retention period but can be on the same order of magnitude. Suppose the retention period for the first stage is two hours. Table 602 is initially filled with first aggregated results obtained during the first hour, and table 604 is initially filled with first aggregated results obtained during the second hour. In this example, at the end of two hours, some of the old aggregated results need to be removed to make room for new aggregated results. Thus, for data in the next two hour window, the entire contents of table 602 are deleted, and table 604 now stores aggregated data for the first hour, and table 602 is used to store aggregated data for the second hour. Because aggregated results are stored in separate tables and deleted separately, holes in the database are avoided.
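The two-table rotation can be sketched with Python's built-in `sqlite3` module. The table names, the two-column schema, and the `write_hour` helper are assumptions made for illustration; the two tables stand in for tables 602 and 604.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical names standing in for tables 602 and 604.
TABLES = ["first_stage_slot_0", "first_stage_slot_1"]
for name in TABLES:
    conn.execute(f"CREATE TABLE {name} (metric_timestamp INTEGER, value REAL)")


def table_for_hour(hour_index):
    """Hours alternate between the two tables."""
    return TABLES[hour_index % len(TABLES)]


def write_hour(hour_index, rows):
    """Empty the table being reused, then append the hour's batch."""
    name = table_for_hour(hour_index)
    conn.execute(f"DELETE FROM {name}")  # whole table cleared at once, no holes
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", rows)
    conn.commit()


# Three hours of five-second results; the third hour reuses the first table.
for hour in range(3):
    rows = [(hour * 3600 + s, float(hour)) for s in range(0, 3600, 5)]
    write_hour(hour, rows)

for name in TABLES:
    print(name, conn.execute(
        f"SELECT COUNT(*), MIN(metric_timestamp) FROM {name}").fetchone())
```

After the third hour, the first table holds only the newest hour and the second table holds the hour before it, so the two-hour retention window is preserved. SQLite has no `TRUNCATE` statement, so an unqualified `DELETE` is used here; other databases may offer a cheaper way to empty or drop a whole table.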

Because the database uses time series based database tables to store aggregated results, when the database is queried, the query will not necessarily be performed on a single table. Thus, in some embodiments, the metrics manager provides a query application programming interface (API) that hides the details of the underlying time series based database tables and gives the appearance of making a query to and receiving results from a single table.

FIG. 7 is a flowchart illustrating an embodiment of a process for querying metrics data stored in a database. Process 700 can be performed by the metrics manager in response to a database query, which can be initiated manually by a user via a user interface tool provided by a performance monitoring application, automatically by the performance monitoring application, etc.

At 702, the database query is analyzed to determine one or more corresponding time series based database tables associated with the database query. Specifically, the time window of the query is compared with the time windows of the time series based database tables.

At 704, it is determined whether the time window being queried spans only a single time series based database table. If so, the database query is performed normally without changes to the query, at 706. If, however, the time window being queried spans multiple time series based database tables, then the particular time series based database tables are determined and the process continues at 708.

At 708, the database query is converted into a union of multiple sub-queries across the determined time series based database tables.

At 710, filters from the database query are applied to the sub-queries such that the database's efficient filtering can be used optimally. The efficiency of filtering is gained as filters are applied on a per table basis before the results are joined together. Thus, the time complexity of filtering becomes K (the maximum number of rows in any table) instead of N (the number of combined rows across tables), where K<<N.

At 712, the sub-queries are performed on the database.

At 714, the responses to the sub-queries are combined into a single response.

This way, to the generator of the query (e.g., the performance monitoring application), it appears as if the query were performed on a single table.
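Steps 702 through 710 can be sketched as a small query planner. This is an illustrative sketch, not the metrics manager's actual API: the `TABLES` catalog, `plan_subqueries`, and `build_query` are hypothetical names, the table names and time windows are borrowed from the FIG. 8 example, and the generated SQL is simplified (no join against the entity table).

```python
from datetime import datetime

# Hypothetical catalog: (table name, first timestamp, last timestamp) per table.
TABLES = [
    ("se_stats_table_1hour_396333",
     datetime(2015, 3, 19, 21, 0, 0), datetime(2015, 3, 19, 21, 59, 55)),
    ("se_stats_table_1hour_396334",
     datetime(2015, 3, 19, 22, 0, 0), datetime(2015, 3, 19, 22, 59, 55)),
]


def plan_subqueries(start, end):
    """Steps 702/704: find the tables the window overlaps and clip the window."""
    plan = []
    for name, t_start, t_end in TABLES:
        lo, hi = max(start, t_start), min(end, t_end)
        if lo <= hi:
            plan.append((name, lo, hi))
    return plan


def build_query(start, end):
    """Steps 708/710: one filtered sub-query per table, combined with UNION ALL."""
    selects, params = [], []
    for name, lo, hi in plan_subqueries(start, end):
        selects.append(
            f"SELECT metric_timestamp, avg_cpu_usage FROM {name} "
            "WHERE metric_timestamp >= ? AND metric_timestamp <= ?"
        )
        params += [lo.isoformat(), hi.isoformat()]
    return " UNION ALL ".join(selects), params


sql, params = build_query(datetime(2015, 3, 19, 21, 3, 25),
                          datetime(2015, 3, 19, 22, 3, 20))
print(sql)
print(params)
```

The time filter is pushed into each sub-query before the union, so each table is filtered independently. The timestamps are passed as bound parameters rather than spliced into the SQL string; the table names come from the internal catalog, not from user input.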

FIG. 8 is a diagram illustrating an example of a query to a database comprising multiple time series based database tables. In FIG. 8, a plurality of database tables is used to store aggregated metrics from various stages of the pipeline. In particular, tables 802 and 804 are shown to store the first hour of the first aggregated results and the second hour of the first aggregated results, respectively. As shown, table 802 stores metrics gathered between 21:00:00-21:59:55 and table 804 stores metrics gathered between 22:00:00-22:59:55. Metrics in both tables are gathered in 5-second increments.

Suppose the following database query is made to query the database:

    SELECT se_stats_table.metric_timestamp AS se_stats_table_metric_timestamp,
           se_stats_table.avg_cpu_usage AS se_stats_table_avg_cpu_usage,
           entity_table.entity_id AS entity_id
    FROM se_stats_table
    JOIN entity_table ON entity_table.entity_key = se_stats_table.entity_key
    WHERE se_stats_table.metric_timestamp >= '2015-03-19T21:03:25'
      AND se_stats_table.metric_timestamp <= '2015-03-19T22:03:20'
      AND se_stats_table.metric_period = '5SECOND'
      AND entity_table.entity_id = 'se-1'

Referring to FIG. 7, at 702, the database query is analyzed and it is determined that there are two time series based database tables (802 and 804) that correspond to the database query.

At 708 and 710, the database query is converted into a union of two sub-queries, and filters are applied to the sub-queries. In this example, the sub-queries correspond to their respective database tables. The sub-query that spans the time window of ‘2015-03-19T21:03:25’ to ‘2015-03-19T21:59:55’ is:

    SELECT se_stats_table_1hour_396333.metric_timestamp AS se_stats_table_1hour_396333_metric_timestamp,
           se_stats_table_1hour_396333.avg_cpu_usage AS se_stats_table_1hour_396333_avg_cpu_usage,
           entity_table.entity_id AS entity_id
    FROM se_stats_table_1hour_396333
    JOIN entity_table ON entity_table.entity_key = se_stats_table_1hour_396333.entity_key
    WHERE se_stats_table_1hour_396333.metric_timestamp >= '2015-03-19T21:03:25'
      AND se_stats_table_1hour_396333.metric_timestamp <= '2015-03-19T21:59:55'
      AND se_stats_table_1hour_396333.metric_period = '5SECOND'
      AND entity_table.entity_id = 'se-1'

The sub-query that spans the time window of ‘2015-03-19T22:00:00’ to ‘2015-03-19T22:03:20’ is:

    SELECT se_stats_table_1hour_396334.metric_timestamp AS se_stats_table_1hour_396334_metric_timestamp,
           se_stats_table_1hour_396334.avg_cpu_usage AS se_stats_table_1hour_396334_avg_cpu_usage,
           entity_table.entity_id AS entity_id
    FROM se_stats_table_1hour_396334
    JOIN entity_table ON entity_table.entity_key = se_stats_table_1hour_396334.entity_key
    WHERE se_stats_table_1hour_396334.metric_timestamp >= '2015-03-19T22:00:00'
      AND se_stats_table_1hour_396334.metric_timestamp <= '2015-03-19T22:03:20'
      AND se_stats_table_1hour_396334.metric_period = '5SECOND'
      AND entity_table.entity_id = 'se-1'

The union of the sub-queries with filters is:

    SELECT anon_1.se_stats_table_1hour_396333_metric_timestamp AS metric_timestamp,
           anon_1.se_stats_table_1hour_396333_avg_cpu_usage AS avg_cpu_usage,
           anon_1.entity_id AS entity_id
    FROM (
        SELECT se_stats_table_1hour_396333.metric_timestamp AS se_stats_table_1hour_396333_metric_timestamp,
               se_stats_table_1hour_396333.avg_cpu_usage AS se_stats_table_1hour_396333_avg_cpu_usage,
               entity_table.entity_id AS entity_id
        FROM se_stats_table_1hour_396333
        JOIN entity_table ON entity_table.entity_key = se_stats_table_1hour_396333.entity_key
        WHERE se_stats_table_1hour_396333.metric_timestamp >= '2015-03-19T21:03:25'
          AND se_stats_table_1hour_396333.metric_timestamp <= '2015-03-19T21:59:55'
          AND se_stats_table_1hour_396333.metric_period = '5SECOND'
          AND entity_table.entity_id = 'se-1'
        UNION ALL
        SELECT se_stats_table_1hour_396334.metric_timestamp AS se_stats_table_1hour_396334_metric_timestamp,
               se_stats_table_1hour_396334.avg_cpu_usage AS se_stats_table_1hour_396334_avg_cpu_usage,
               entity_table.entity_id AS entity_id
        FROM se_stats_table_1hour_396334
        JOIN entity_table ON entity_table.entity_key = se_stats_table_1hour_396334.entity_key
        WHERE se_stats_table_1hour_396334.metric_timestamp >= '2015-03-19T22:00:00'
          AND se_stats_table_1hour_396334.metric_timestamp <= '2015-03-19T22:03:20'
          AND se_stats_table_1hour_396334.metric_period = '5SECOND'
          AND entity_table.entity_id = 'se-1'
    ) AS anon_1
    ORDER BY anon_1.se_stats_table_1hour_396333_metric_timestamp
    LIMIT 720

The rearrangement of the query across multiple time series tables shown above does not compromise the performance of read operations to the database, and facilitates efficient write operations to the database by the metrics manager.

Managing performance metrics has been disclosed. By processing the metrics in a pipeline in memory, the technique described above significantly reduces the number of I/O operations and the latency associated with processing the metrics, and allows for real time analytics.

Although the foregoing embodiments have been described in some detail for purposes of clarity of understanding, the invention is not limited to the details provided. There are many alternative ways of implementing the invention. The disclosed embodiments are illustrative and not restrictive.

What is claimed is: 1-21. (canceled)
22. A method of analyzing metric data sets associated with a set of elements in a network, the method comprising: aggregating, at a first rate, a plurality of metric data sets associated with the set of network elements to generate a plurality of first aggregated results; aggregating, at a second rate, the plurality of first aggregated results to generate a plurality of second aggregated results, the second rate being a lower rate than the first rate; and analyzing the plurality of second aggregated results in order to monitor the set of network elements, said first and second aggregation operations performed to reduce amount of memory used to store metric data sets by producing aggregated results for said analyzing operation.
23. The method of claim 22 further comprising analyzing the plurality of first aggregated results in order to monitor performance of the set of network elements.
24. The method of claim 22, wherein the pluralities of the first and second aggregated results are stored in memory, and the analyzing comprises performing fast analytical operations on the plurality of the second aggregated results stored in memory in order to monitor the set of network elements.
25. The method of claim 24, wherein the analyzing further comprises performing fast event detection operations on the plurality of second aggregated results stored in memory in order to identify events associated with the set of network elements.
26. The method of claim 24 further comprising storing the pluralities of first and second aggregated results to one or more databases for subsequent queries.
27. The method of claim 22, wherein the analyzing comprises: performing an analytical operation on the plurality of second aggregated results to monitor the set of network elements; and performing event detection operation on the plurality of second aggregated results to identify events associated with the set of network elements.
28. The method of claim 27, wherein the analyzing comprises: performing an analytical operation on the plurality of first aggregated results to monitor the set of network elements; and performing event detection operation on the plurality of first aggregated results to identify events associated with the set of network elements.
29. The method of claim 21 further comprising collecting the plurality of metric data sets from a plurality of sources in the network that collect metric data at different rates.
30. The method of claim 21 further comprising storing the pluralities of the first and second aggregated results in memory; aggregating, at a third rate, the plurality of second aggregated results to generate a plurality of third aggregated results, the third rate being a lower rate than the first and second rates; and analyzing the plurality of third aggregated results in order to monitor the set of network elements.
31. A non-transitory computer readable medium storing a program for analyzing metric data sets associated with a set of elements in a network, the program executable by a processing unit, the program comprising sets of instructions for: aggregating, at a first rate, a plurality of metric data sets associated with the set of network elements to generate a plurality of first aggregated results; aggregating, at a second rate, the plurality of first aggregated results to generate a plurality of second aggregated results, the second rate being a lower rate than the first rate; and analyzing the plurality of second aggregated results in order to monitor the set of network elements, said first and second aggregation operations performed to reduce amount of memory used to store metric data sets by producing aggregated results for said analyzing operation.
32. The non-transitory computer readable medium of claim 31, the program further comprising a set of instructions for analyzing the plurality of first aggregated results in order to monitor performance of the set of network elements.
33. The non-transitory computer readable medium of claim 31, wherein the pluralities of the first and second aggregated results are stored in memory, and the set of instructions for analyzing comprises a set of instructions for performing fast analytical operations on the plurality of the second aggregated results stored in memory in order to monitor the set of network elements.
34. The non-transitory computer readable medium of claim 33, wherein the set of instructions for analyzing further comprises a set of instructions for performing fast event detection operations on the plurality of second aggregated results stored in memory in order to identify events associated with the set of network elements.
35. The non-transitory computer readable medium of claim 33, the program further comprising a set of instructions for storing the pluralities of first and second aggregated results to one or more databases for subsequent queries.
36. The non-transitory computer readable medium of claim 31, wherein the set of instructions for analyzing comprises sets of instructions for: performing an analytical operation on the plurality of second aggregated results to monitor the set of network elements; and performing event detection operation on the plurality of second aggregated results to identify events associated with the set of network elements.
37. The non-transitory computer readable medium of claim 36, wherein the set of instructions for analyzing comprises sets of instructions for: performing an analytical operation on the plurality of first aggregated results to monitor the set of network elements; and performing event detection operation on the plurality of first aggregated results to identify events associated with the set of network elements.
38. The non-transitory computer readable medium of claim 31, the program further comprising a set of instructions for collecting the plurality of metric data sets from a plurality of sources in the network that collect metric data at different rates.
39. The non-transitory computer readable medium of claim 31, the program further comprising sets of instructions for: storing the pluralities of the first and second aggregated results in memory; aggregating, at a third rate, the plurality of second aggregated results to generate a plurality of third aggregated results, the third rate being a lower rate than the first and second rates; and analyzing the plurality of third aggregated results in order to monitor the set of network elements.